feat: Add scorer that exposes helpers to evaluate agents #2146

Open
Luca Forstner (lforst) wants to merge 3 commits into main from lforst/eval-run-assertions

Luca Forstner (lforst) commented Jun 19, 2026

import { Eval, agentAssertionScorer } from "braintrust";
import { z } from "zod";

await Eval("agent-behavior", {
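  data: () => [
    {
      input: "What is the capital of Estonia?",
      expected: { answer: "Tallinn" },
    },
  ],
  task: async (input) => {
    // Run the agent under test on `input` and return its structured output
    // ({ answer, citations, confidence }, per the schema asserted below).
  },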
  scores: [
    agentAssertionScorer(({ output, expected, assert }) => [
      // Tool helpers
      assert.calledTool(
        "web_search",
        {
          input: { query: /capital of Estonia/i },
          times: 1,
        },
        "searches for the answer once",
      ),
      assert.calledTool(
        "summarize_source",
        {
          output: { citations: (value) => Array.isArray(value) },
          isError: false,
        },
        "summarizes sources successfully",
      ),
      assert.notCalledTool("send_email", "does not send email"),
      assert.toolOrder(
        ["web_search", "summarize_source"],
        "searches before summarizing",
      ),
      assert.maxToolCalls(3, "keeps tool use bounded"),
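      // Shown for completeness: this cannot pass in the same run as the
      // calledTool assertions above, which require at least one tool call.
      assert.usedNoTools("does not need tools for memorized fact"),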

      // Generic assertion helpers
      assert.contains(output, "Tallinn", "answers directly"),
      assert.equals(output.answer, expected.answer, "exact expected answer"),
      assert.notEquals(output.answer, "I don't know", "does not punt"),
      assert.contains(output.answer, /Tallinn/i, "mentions Tallinn"),
      assert.matches(
        output,
        z.object({
          answer: z.string(),
          citations: z.array(z.string()).min(1),
          confidence: z.number().min(0).max(1),
        }),
        "returns the expected shape",
      ),
    ]),
  ],
});
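The helpers above imply matcher semantics that the description does not spell out: a RegExp such as `/capital of Estonia/i` tests a string field, a function such as `(value) => Array.isArray(value)` acts as a predicate, and `times` pins a call count. The following is a hypothetical, dependency-free reimplementation of the tool-call helpers over an assumed `ToolCall` record. `ToolCall`, `Matcher`, `matchesField`, `matchesShape`, and the standalone function signatures are illustrative assumptions, not this PR's code. It also assumes that `toolOrder` checks a subsequence (unrelated calls may occur in between) and that omitting `times` means "at least once".

```typescript
// Hypothetical sketch: these types and functions are illustrative assumptions,
// not the implementation in this PR.

type ToolCall = {
  name: string;
  input: Record<string, unknown>;
  output?: unknown;
  isError?: boolean;
};

type Predicate = (value: unknown) => boolean;
// A RegExp tests a string field, a function is a predicate, anything else uses ===.
type Matcher = RegExp | Predicate | string | number | boolean | null;

function matchesField(matcher: Matcher, value: unknown): boolean {
  if (matcher instanceof RegExp) {
    return typeof value === "string" && matcher.test(value);
  }
  if (typeof matcher === "function") {
    return matcher(value);
  }
  return matcher === value;
}

// Every key in the pattern must match the same key on the record; extra keys are ignored.
function matchesShape(pattern: Record<string, Matcher> | undefined, record: unknown): boolean {
  if (pattern === undefined) return true;
  if (typeof record !== "object" || record === null) return false;
  const fields = record as Record<string, unknown>;
  return Object.entries(pattern).every(([key, m]) => matchesField(m, fields[key]));
}

type CalledToolOptions = {
  input?: Record<string, Matcher>;
  output?: Record<string, Matcher>;
  isError?: boolean;
  times?: number; // exact count; omitted means "at least once" (assumption)
};

function calledTool(calls: ToolCall[], name: string, opts: CalledToolOptions = {}): boolean {
  const hits = calls.filter(
    (c) =>
      c.name === name &&
      matchesShape(opts.input, c.input) &&
      matchesShape(opts.output, c.output) &&
      (opts.isError === undefined || (c.isError ?? false) === opts.isError),
  );
  return opts.times === undefined ? hits.length > 0 : hits.length === opts.times;
}

function notCalledTool(calls: ToolCall[], name: string): boolean {
  return calls.every((c) => c.name !== name);
}

// Assumption: names must appear as a subsequence; unrelated calls may sit in between.
function toolOrder(calls: ToolCall[], names: string[]): boolean {
  let next = 0;
  for (const c of calls) {
    if (next < names.length && c.name === names[next]) next++;
  }
  return next === names.length;
}

function maxToolCalls(calls: ToolCall[], max: number): boolean {
  return calls.length <= max;
}

function usedNoTools(calls: ToolCall[]): boolean {
  return calls.length === 0;
}

// A recorded trace for the Estonia question (made-up data for illustration).
const calls: ToolCall[] = [
  { name: "web_search", input: { query: "What is the capital of Estonia?" } },
  {
    name: "summarize_source",
    input: { source: "doc-1" },
    output: { citations: ["doc-1"] },
    isError: false,
  },
];

console.log(calledTool(calls, "web_search", { input: { query: /capital of Estonia/i }, times: 1 })); // true
console.log(
  calledTool(calls, "summarize_source", {
    output: { citations: (value: unknown) => Array.isArray(value) },
    isError: false,
  }),
); // true
console.log(notCalledTool(calls, "send_email")); // true
console.log(toolOrder(calls, ["web_search", "summarize_source"])); // true
console.log(maxToolCalls(calls, 3)); // true
console.log(usedNoTools(calls)); // false
```

On this trace the `calledTool` checks pass while `usedNoTools` returns false, which is why those two kinds of assertion cannot both pass for a single run of the example above.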

Example scorer output (the assertion names are from a separate ticket-routing example):

{
  name: "assertions",
  score: passedAssertions / totalAssertions,
  metadata: {
    assertions: [
      { name: "routes to expected department", passed: true },
      { name: "returns valid route shape", passed: false },
      { name: "called tool classify_ticket", passed: true },
    ],
    failed: [
      "returns valid route shape: expected output to match RouteSchema",
    ],
  },
}
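The `score` field is the fraction of passing assertions, so the sample above, with two of its three assertions passing, would score 2/3 ≈ 0.667. A minimal sketch of that aggregation, assuming each assertion resolves to a name, a pass flag, and an optional failure message. `AssertionResult`, `ScorerOutput`, and `toScore` are hypothetical names, the `name: message` format for `failed` entries is inferred from the sample, and the behavior for an empty assertion list is an assumption.

```typescript
// Hypothetical sketch: AssertionResult, ScorerOutput, and toScore are illustrative
// names, not this PR's API.

type AssertionResult = {
  name: string;
  passed: boolean;
  message?: string; // failure reason, e.g. "expected output to match RouteSchema"
};

type ScorerOutput = {
  name: string;
  score: number;
  metadata: {
    assertions: { name: string; passed: boolean }[];
    failed: string[];
  };
};

function toScore(results: AssertionResult[]): ScorerOutput {
  const passedCount = results.filter((r) => r.passed).length;
  return {
    name: "assertions",
    // Assumption: an empty assertion list scores 1 rather than dividing by zero.
    score: results.length === 0 ? 1 : passedCount / results.length,
    metadata: {
      assertions: results.map(({ name, passed }) => ({ name, passed })),
      // Failures are reported as "name: message", matching the sample output.
      failed: results
        .filter((r) => !r.passed)
        .map((r) => (r.message ? `${r.name}: ${r.message}` : r.name)),
    },
  };
}

const result = toScore([
  { name: "routes to expected department", passed: true },
  {
    name: "returns valid route shape",
    passed: false,
    message: "expected output to match RouteSchema",
  },
  { name: "called tool classify_ticket", passed: true },
]);

console.log(result.score.toFixed(3)); // 0.667
console.log(result.metadata.failed);
```

Keeping `failed` as preformatted strings alongside the full `assertions` list lets a reviewer scan only the failures while the complete pass/fail record stays available in metadata.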
